
Conversation


alex-w-99 commented Jan 15, 2026

Guide agent scaffolding

Adds a conversational AI agent that helps users define and create web automation routines through natural language interaction.

What's new

Guide agent (web_hacker/agents/guide_agent/)

  • Conversational assistant that gathers task requirements (what to automate, parameters, expected output, target website) and initiates routine discovery
  • Tool-calling support with a user confirmation flow before execution (sketched after this list)
  • Streaming response support for real-time output
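
For context, a minimal sketch of how the confirmation flow might look from a caller's perspective. GuideAgent and PendingToolInvocation are real names from this PR, but the method and attribute names below (send_message, pending_tool_invocation, confirm_tool, reject_tool) are illustrative assumptions, not the actual API:

```python
# Hypothetical caller-side view of the tool confirmation flow.
from web_hacker.agents.guide_agent import GuideAgent  # import path assumed

agent = GuideAgent()  # construction details omitted/assumed

response = agent.send_message("Automate downloading my monthly invoices")  # assumed method

# A proposed tool call is held as a PendingToolInvocation until the user approves it.
if response.pending_tool_invocation is not None:  # assumed attribute
    pending = response.pending_tool_invocation
    answer = input(f"Run tool '{pending.tool_name}' with {pending.arguments}? [y/N] ")
    if answer.lower() == "y":
        agent.confirm_tool(pending.id)   # assumed: executes the tool and continues the chat
    else:
        agent.reject_tool(pending.id)    # assumed: discards the pending call
```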

Unified LLM client (web_hacker/llms/)

  • Facade client supporting both OpenAI (GPT-5 series) and Anthropic (Claude 4.5 series) models
  • Vendor-specific implementations with async support
  • Tool/function registration via register_tool_from_function() with automatic schema generation from type hints and docstrings (see the sketch after this list)
  • Structured output support using Pydantic models
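
A minimal sketch of the function-to-tool registration path. register_tool_from_function and LLMClient are from this PR; the import path and constructor signature below are assumptions:

```python
from web_hacker.llms import LLMClient  # import path assumed

def start_routine_discovery(website_url: str, task_description: str) -> str:
    """Kick off routine discovery for a target website.

    The tool's description comes from this docstring, and its parameters
    schema is generated from the type hints above.
    """
    raise NotImplementedError

client = LLMClient(model="gpt-5")  # assumed constructor signature
client.register_tool_from_function(start_routine_discovery)
```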

Chat data models (web_hacker/data_models/chat.py)

  • Chat and ChatThread models for conversation persistence
  • PendingToolInvocation for tracking tool calls awaiting user confirmation (illustrated after this list)
  • LLMChatResponse and streaming response types
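
Roughly, the shapes involved (illustrative only; the real models in web_hacker/data_models/chat.py may carry different fields):

```python
from datetime import datetime, timezone
from pydantic import BaseModel, Field

class ResourceBase(BaseModel):
    """Assumed shape of the shared base: an ID plus a timezone-aware timestamp."""
    id: str
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

class PendingToolInvocation(ResourceBase):
    tool_name: str            # assumed field names
    arguments: dict
    confirmed: bool = False

class Chat(ResourceBase):
    thread_id: str            # links back to a ChatThread
    role: str                 # e.g. "user" / "assistant"
    content: str
```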

CLI and scripts

  • Interactive terminal CLI (web_hacker/scripts/run_guide_agent.py) for local testing
  • Root-level script (scripts/run_guide_agent.py) as alternate entry point

Other changes

  • Added ResourceBase for consistent ID and timestamp handling across models
  • Added UnknownToolError exception
  • Replaced deprecated datetime.utcnow() calls with a timezone-aware alternative (see the sketch after this list)
  • Added unit tests for tool utility functions
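
The datetime fix presumably follows the standard replacement for the deprecated API:

```python
from datetime import datetime, timezone

# Before (deprecated since Python 3.12, returns a naive datetime):
# created_at = datetime.utcnow()

# After (timezone-aware UTC timestamp):
created_at = datetime.now(timezone.utc)
```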

Next steps

  • Pass the CDP captures directory path to the guide agent
  • Implement the "invoke discovery agent" tool used by the guide agent

alex-w-99 and others added 30 commits January 15, 2026 19:22
- Add StartRoutineDiscoveryJobCreationParams Pydantic model for tool schema
- Add data_models/guide_agent/ with conversation state and message types
- Add data_models/websockets/ with base WS types and guide-specific commands/responses
- Update GuideAgent with callback pattern, tool confirmation flow, state management
- Business-logic stubs marked with NotImplementedError for a subsequent PR

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Move all WebSocket types (base, browser, guide) into one consolidated
websockets.py file. Also move test_websockets.py from the servers repo.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace the Pydantic model + constants with a simple function stub
that a colleague will implement. The guide agent now uses
register_tool_from_function and calls the function directly.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add tool_utils.py with extract_description_from_docstring and
  generate_parameters_schema for converting Python functions to
  LLM tool definitions using pydantic TypeAdapter
- Add register_tool_from_function method to LLMClient that extracts
  name, description, and parameters schema from a function
- Add unit tests for tool_utils

Co-Authored-By: Claude Opus 4.5 <[email protected]>
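
A rough sketch of what generate_parameters_schema could look like with pydantic's TypeAdapter; the actual implementation in tool_utils.py may differ:

```python
import inspect
from typing import Any
from pydantic import TypeAdapter

def generate_parameters_schema(func) -> dict[str, Any]:
    """Build a JSON-schema 'parameters' object from a function's type hints (sketch)."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Fall back to str when a parameter is unannotated.
        annotation = param.annotation if param.annotation is not inspect.Parameter.empty else str
        properties[name] = TypeAdapter(annotation).json_schema()  # per-parameter JSON schema
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}
```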
- Merge GuideWebSocketClientCommandType into WebSocketClientCommandType
- Merge ParsedGuideWebSocketClientCommand into ParsedWebSocketClientCommand
- Remove Guide- prefix from response types (WebSocketMessageResponse, etc.)
- Consolidate response type enums (MESSAGE, STATE, TOOL_INVOCATION_RESULT)
- Add tests for all previously untested models and commands
- Increase test coverage of the websockets module from ~50% to 100%

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…LLM API

- Add Chat and ChatThread models extending ResourceBase with bidirectional linking
- Rename duplicate ChatMessage to EmittedChatMessage for callback messages
- Add LLMToolCall and LLMChatResponse models for tool calling support
- Implement GuideAgent with conversation logic, persistence callbacks, and
  self-aware system prompt for web automation routine creation
- Update all LLM client methods to accept messages array instead of single prompt
  (get_text_sync/async, get_structured_response_sync/async, chat_sync)
- Add run_guide_agent.py terminal chat script

Co-Authored-By: Claude Opus 4.5 <[email protected]>
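
In other words, calls now look roughly like this. chat_sync and LLMChatResponse are from this PR; the message dict keys, constructor, and attribute access below are assumptions:

```python
from web_hacker.llms import LLMClient  # import path assumed

client = LLMClient(model="gpt-5")  # assumed constructor
messages = [
    {"role": "system", "content": "You help users build web automation routines."},
    {"role": "user", "content": "I want to scrape my order history."},
]
response = client.chat_sync(messages)  # previously accepted a single prompt string
print(response.content)                # assumed attribute on LLMChatResponse
```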
Replaces the stub script with a full terminal interface featuring ANSI
colors, an ASCII banner, a tool invocation confirmation flow, and
conversation commands.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Update welcome message to describe CDP capture analysis workflow
- Add links to Vectorly docs and console
- Change banner color to purple
- Fix OpenAI client to use max_completion_tokens for GPT-5 models

Co-Authored-By: Claude Opus 4.5 <[email protected]>
GPT-5 models only support temperature=1 (default), so we omit the
parameter entirely to avoid API errors.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
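
Taken together, the two OpenAI-client commits above imply request building along these lines (a sketch; the function name and the non-GPT-5 defaults are illustrative):

```python
def build_openai_kwargs(model: str, messages: list[dict], max_tokens: int) -> dict:
    """Sketch of vendor-specific parameter handling for the OpenAI client."""
    kwargs: dict = {"model": model, "messages": messages}
    if model.startswith("gpt-5"):
        # Per the commits above: GPT-5 models take max_completion_tokens and
        # only support the default temperature, so temperature is omitted.
        kwargs["max_completion_tokens"] = max_tokens
    else:
        kwargs["max_tokens"] = max_tokens
        kwargs["temperature"] = 0.7  # illustrative default
    return kwargs
```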
- Add chat_stream_sync method to abstract, OpenAI, and Anthropic clients
- Add stream_chunk_callable parameter to GuideAgent
- Update terminal CLI to print chunks as they arrive for typewriter effect

Co-Authored-By: Claude Opus 4.5 <[email protected]>
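
On the CLI side, the typewriter effect reduces to something like this (stream_chunk_callable is the real parameter name from this PR; the rest of the wiring is assumed):

```python
import sys

def print_chunk(chunk: str) -> None:
    """Write each text delta as it arrives, without waiting for a newline."""
    sys.stdout.write(chunk)
    sys.stdout.flush()

# Assumed wiring in the CLI:
# agent = GuideAgent(stream_chunk_callable=print_chunk)
# agent.send_message("...")  # chunks are printed via the callback as they stream in
```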
- Add STREAM_CHUNK and STREAM_END to WebSocketStreamResponseType
- Add WebSocketStreamChunkResponse for text deltas during streaming
- Add WebSocketStreamEndResponse with full accumulated content
- Update WebSocketServerResponse union to include new types

Co-Authored-By: Claude Opus 4.5 <[email protected]>
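
An illustrative reconstruction of the new streaming types; the type and class names are from the commit above, while the payload field names are assumptions:

```python
from enum import Enum
from typing import Literal
from pydantic import BaseModel

class WebSocketStreamResponseType(str, Enum):
    STREAM_CHUNK = "STREAM_CHUNK"
    STREAM_END = "STREAM_END"

class WebSocketStreamChunkResponse(BaseModel):
    type: Literal[WebSocketStreamResponseType.STREAM_CHUNK] = WebSocketStreamResponseType.STREAM_CHUNK
    delta: str      # assumed name for the text delta

class WebSocketStreamEndResponse(BaseModel):
    type: Literal[WebSocketStreamResponseType.STREAM_END] = WebSocketStreamResponseType.STREAM_END
    content: str    # full accumulated content

# The commit also adds these to the WebSocketServerResponse union, roughly:
# WebSocketServerResponse = Union[..., WebSocketStreamChunkResponse, WebSocketStreamEndResponse]
```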
Update tests to use thread_id instead of guide_chat_id to match
the WebSocketStateResponse model change.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
alex-w-99 marked this pull request as ready for review January 16, 2026 06:08